AGREP(l)              UNIX System V (Jan 17, 1992)              AGREP(l)
NAME
agrep - search a file for a string or regular expression,
with approximate matching capabilities
SYNOPSIS
agrep [ -#cdehiklnpstvwxBDGIS ] pattern [ -f patternfile ] [
filename... ]
DESCRIPTION
agrep searches the input filenames (standard input is the
default, but see a warning under LIMITATIONS) for records
containing strings which either exactly or approximately
match a pattern. A record is by default a line, but it can
be defined differently using the -d option (see below).
Normally, each record found is copied to the standard
output. Approximate matching allows finding records that
contain the pattern with several errors including
substitutions, insertions, and deletions. For example,
Massechusets matches Massachusetts with two errors (one
substitution and one insertion). Running agrep -2
Massechusets foo outputs all lines in foo containing any
string with at most 2 errors from Massechusets.
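The idea of matching "with at most k errors" can be illustrated with a small dynamic-programming sketch. This is not the algorithm agrep uses (see the Wu and Manber reports cited below for that); it is a plain textbook approach, and the function names min_errors and approx_grep are made up for this illustration.

```python
def min_errors(pattern: str, text: str) -> int:
    """Smallest number of edits between pattern and any substring of text."""
    # prev[i] = fewest edits needed to match pattern[:i] against some
    # substring of the text that ends at the previous text position.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        cur = [0]  # a match may start anywhere in the text, so row 0 is free
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (p != ch),  # exact match or substitution
                prev[i] + 1,              # text has an extra character
                cur[i - 1] + 1,           # text lacks a pattern character
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


def approx_grep(pattern: str, lines, max_errors: int):
    """Return the lines that contain pattern with at most max_errors edits."""
    return [ln for ln in lines if min_errors(pattern, ln) <= max_errors]
```

With this sketch, min_errors("Massechusets", "Massachusetts") evaluates to 2, which agrees with the example above: a line containing Massachusetts is reported with a budget of 2 errors but not with a budget of 1.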
agrep supports many kinds of queries including arbitrary
wild cards, sets of patterns, and in general, regular
expressions. See PATTERNS below. It supports most of the
options supported by the grep family plus several more (but
it is not 100% compatible with grep). For more information
on the algorithms used by agrep see Wu and Manber, "Fast
Text Searching With Errors," Technical report #91-11,
Department of Computer Science, University of Arizona, June
1991 (available by anonymous ftp from cs.arizona.edu in
agrep/agrep.ps.1), and Wu and Manber, "Agrep -- A Fast
Approximate Pattern Searching Tool", To appear in USENIX
Conference 1992 January (available by anonymous ftp from
cs.arizona.edu in agrep/agrep.ps.2).
As with the rest of the grep family, the characters `$',
`^', `*', `[', `]', `|', `(', `)', `!', and `\' can
cause unexpected results when included in the pattern, as
these characters are also meaningful to the shell. To avoid
these problems, one should always enclose the entire pattern
argument in single quotes, i.e., 'pattern'. Do not use
double quotes (").
When agrep is applied to more than one input file, the name
of the file is displayed preceding each line which matches
the pattern. The filename is not displayed when processing
a single file, so if you actually want the filename to
appear, use /dev/null as a second file in the list.
OPTIONS
-#  # is a non-negative integer (at most 8) specifying the
maximum number of errors permitted in finding the
approximate matches (defaults to zero). Generally,
each insertion, deletion, or substitution counts as one
error. It is possible to adjust the relative cost of
insertions, deletions and substitutions (see -I -D and
-S options).
-c  Display only the count of matching records.
-d 'delim'
Define delim to be the separator between two records.
The default value is '$', namely a record is by default
a line. delim can be a string of size at most 8 (with
possible use of ^ and $), but not a regular expression.
Text between two delim's, before the first delim, and
after the last delim is considered as one record. For
example, -d '$$' defines paragraphs as records and -d
'^From ' defines mail messages as records. agrep
matches each record separately. This option does not
currently work with regular expressions.
-e pattern
Same as a simple pattern argument, but useful when the
pattern begins with a `-'.
-f patternfile
patternfile contains a set of (simple) patterns. The
output is all lines that match at least one of the
patterns in patternfile. Currently, the -f option works
only for exact match and for simple patterns (any meta
symbol is interpreted as a regular character); it is
compatible only with -c, -h, -i, -l, -s, -v, -w, and -x
options. See LIMITATIONS for size bounds.
-h  Do not display filenames.
-i  Case-insensitive search - e.g., "A" and "a" are
considered equivalent.
-k  No symbol in the pattern is treated as a meta
character. For example, agrep -k 'a(b|c)*d' foo will
find the occurrences of a(b|c)*d in foo whereas agrep
'a(b|c)*d' foo will find substrings in foo that match
the regular expression 'a(b|c)*d'.
-l  List only the files that contain a match. This option
is useful for looking for files containing a certain
pattern. For example, agrep -l 'wonderful' * will
list the names of those files in the current directory
that contain the word 'wonderful'.
-n  Each line that is printed is prefixed by its record
number in the file.
-p  Find records in the text that contain a supersequence
of the pattern. For example,
agrep -p DCS foo will match "Department of Computer
Science."
-s  Work silently, that is, display nothing except error
messages. This is useful for checking the error
status.
-t  Output the record starting from the end of delim to
(and including) the next delim. This is useful for
cases where delim should come at the end of the record.
-v  Inverse mode - display only those records that do not
contain the pattern.
-w  Search for the pattern as a word - i.e., surrounded by
non-alphanumeric characters. The non-alphanumeric
characters must surround the match; they cannot be
counted as errors. For example, agrep -w -1 car will
match cars, but not characters.
-x  The pattern must match the whole line.
-y  Used with -B option. When -y is on, agrep will always
output the best matches without giving a prompt.
-B  Best match mode. When -B is specified and no exact
matches are found, agrep will continue to search until
the closest matches (i.e., the ones with minimum number
of errors) are found, at which point the following
message will be shown: "the best match contains x
errors, there are y matches, output them? (y/n)" The
best match mode is not supported for standard input,
e.g., pipeline input. When the -#, -c, or -l options
are specified, the -B option is ignored. In general,
-B may be slower than -#, but not by very much.
-Dk Set the cost of a deletion to k (k is a positive
integer). This option does not currently work with
regular expressions.
-G  Output the files that contain a match.
-Ik Set the cost of an insertion to k (k is a positive
integer). This option does not currently work with
regular expressions.
-Sk Set the cost of a substitution to k (k is a positive
integer). This option does not currently work with
regular expressions.
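The effect of the -I, -D and -S cost options can be pictured with a weighted edit distance. The sketch below is only a conceptual illustration that compares two whole strings; it makes no claim about how agrep applies the costs internally. The function name weighted_distance and its parameters are hypothetical, and it assumes that an "insertion" is an extra character in the text and a "deletion" is a pattern character missing from the text.

```python
def weighted_distance(pattern, text, ins=1, dele=1, sub=1):
    """Cheapest total cost of editing pattern into text (whole strings)."""
    # row[j] = cost of turning the pattern prefix seen so far into text[:j]
    row = [j * ins for j in range(len(text) + 1)]
    for p in pattern:
        new = [row[0] + dele]  # empty text: every pattern character is dropped
        for j, t in enumerate(text, start=1):
            new.append(min(
                row[j - 1] + (0 if p == t else sub),  # keep or substitute
                row[j] + dele,                        # drop p from the pattern
                new[j - 1] + ins,                     # add t to the pattern
            ))
        row = new
    return row[-1]
```

Under these assumptions, raising the deletion and substitution costs to 2 while allowing a total cost of 1 leaves insertions as the only affordable error: weighted_distance("abcd", "abxcd", dele=2, sub=2) is 1, whereas weighted_distance("abcd", "abxd", dele=2, sub=2) is 2.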
PATTERNS
agrep supports a large variety of patterns, including simple
strings, strings with classes of characters, sets of
strings, wild cards, and regular expressions.
Strings
any sequence of characters, including the special
symbols `^' for beginning of line and `$' for end of
line. The special characters listed above (`$', `^',
`*', `[', `|', `(', `)', `!', and `\') should be
preceded by `\' if they are to be matched as regular
characters. For example, \^abc\\ corresponds to the
string ^abc\, whereas ^abc corresponds to the string
abc at the beginning of a line.
Classes of characters
a list of characters inside [] (in order) corresponds
to any character from the list. For example, [a-ho-z]
is any character between a and h or between o and z.
The symbol `^' inside [] complements the list. For
example, [^i-n] denotes any character in the character
set except the characters 'i' through 'n'. The symbol
`^' thus has two meanings, but this is consistent with
egrep. The symbol `.' (don't care) stands for any symbol
(except for the newline symbol).
Boolean operations
agrep supports an `and' operation `;' and an `or'
operation `,', but not a combination of both. For
example, 'fast;network' searches for all records
containing both words.
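A minimal sketch of the `;' and `,' semantics described above, restricted to exact substring tests. The function name matches_boolean is hypothetical, and the real program also allows errors and other pattern syntax that this sketch ignores.

```python
def matches_boolean(query: str, record: str) -> bool:
    """Evaluate an 'and' (;) or 'or' (,) query against one record."""
    has_and = ";" in query
    has_or = "," in query
    if has_and and has_or:
        # Mirrors the stated restriction: the two operators cannot be mixed.
        raise ValueError("';' and ',' cannot be combined in one pattern")
    if has_and:
        return all(term in record for term in query.split(";"))
    # With no operator at all, split(",") yields the query itself.
    return any(term in record for term in query.split(","))
```

For instance, matches_boolean("fast;network", "a fast local network") is true, while the same query against "a fast car" is false; the query "fast,network" accepts both records.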
Wild cards
The symbol '#' is used to denote a wild card. #
matches zero or any number of arbitrary characters.
For example, ex#e matches example. The symbol # is
equivalent to .* in egrep. In fact, .* will work too,
because it is a valid regular expression (see below),
but unless this is part of an actual regular
expression, # will work faster.
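The stated equivalence between # and .* can be sketched by translating a wild-card pattern into a regular expression for Python's re module. This is a simplification: the helper names are made up, every character other than # is treated literally, and an escaped \# is not handled.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate '#' wild cards into '.*', keeping all other text literal."""
    # Escape each literal piece so that only '#' carries special meaning.
    return ".*".join(re.escape(piece) for piece in pattern.split("#"))


def wildcard_search(pattern: str, line: str) -> bool:
    """True if the line contains a match for the wild-card pattern."""
    return re.search(wildcard_to_regex(pattern), line) is not None
```

Here wildcard_search("ex#e", "example") is true and wildcard_search("ex#e", "exit") is false. Python's `.' also excludes the newline character by default, which agrees with the description of `.' above.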
Combination of exact and approximate matching
any pattern inside angle brackets <> must match the
text exactly even if the match is with errors. For
example, <mathemat>ics matches mathematical with one
error (replacing the last s with an a), but
mathe<matics> does not match mathematical no matter how
many errors we allow.
Regular expressions
The syntax of regular expressions in agrep is in
general the same as that for egrep. The union
operation `|', Kleene closure `*', and parentheses ()
are all supported. Currently '+' is not supported.
Regular expressions are currently limited to
approximately 30 characters (generally excluding meta
characters). Some options (-d, -w, -f, -t, -x, -D, -I,
-S) do not currently work with regular expressions.
The maximal number of errors for regular expressions
that use '*' or '|' is 4.
EXAMPLES
agrep -2 -c ABCDEFG foo
gives the number of lines in file foo that contain
ABCDEFG within two errors.
agrep -1 -D2 -S2 'ABCD#YZ' foo
outputs the lines containing ABCD followed, within
arbitrary distance, by YZ, with up to one additional
insertion (-D2 and -S2 make deletions and substitutions
too "expensive").
agrep -5 -p abcdefghij /usr/dict/words
outputs the list of all words containing at least 5 of
the first 10 letters of the alphabet in order. (Try
it: any list starting with academia and ending with
sacrilegious must mean something!)
agrep -1 'abc[0-9](de|fg)*[x-z]' foo
outputs the lines containing, within up to one error,
the string that starts with abc followed by one digit,
followed by zero or more repetitions of either de or
fg, followed by either x, y, or z.
agrep -d '^From ' 'breakdown;internet' mbox
outputs all mail messages (the pattern '^From '
separates mail messages in a mail file) that contain
keywords 'breakdown' and 'internet'.
agrep -d '$$' -1 '<word1> <word2>' foo
finds all paragraphs that contain word1 followed by
word2 with one error in place of the blank. In
particular, if word1 is the last word in a line and
word2 is the first word in the next line, then the
space will be substituted by a newline symbol and it
will match. Thus, this is a way to overcome separation
by a newline. Note that -d '$$' (or another delim
which spans more than one line) is necessary, because
otherwise agrep searches only one line at a time.
agrep '^agrep' <this manual>
outputs all the examples of the use of agrep in this
man page.
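The -p examples (DCS matching "Department of Computer Science." and the abcdefghij query) can be reproduced with a longest-common-subsequence sketch. It assumes that, in -p mode, each error stands for one pattern character that is allowed to be missing from the record; that reading is inferred from the "at least 5 of the first 10 letters" example and is not stated elsewhere in this page. The function names are hypothetical.

```python
def lcs_length(pattern: str, text: str) -> int:
    """Length of the longest common subsequence of pattern and text."""
    row = [0] * (len(text) + 1)
    for p in pattern:
        new = [0]
        for j, t in enumerate(text, start=1):
            if p == t:
                new.append(row[j - 1] + 1)           # extend a common subsequence
            else:
                new.append(max(row[j], new[j - 1]))  # skip p or skip t
        row = new
    return row[-1]


def supersequence_match(pattern: str, text: str, max_errors: int = 0) -> bool:
    """True if text holds all but at most max_errors pattern characters, in order."""
    return len(pattern) - lcs_length(pattern, text) <= max_errors
```

With this sketch, supersequence_match("abcdefghij", "academia", 5) holds because a, c, d, e and i appear in that order, and "sacrilegious" qualifies through a, c, e, g and i.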
SEE ALSO
ed(1), ex(1), grep(1V), sh(1), csh(1).
BUGS/LIMITATIONS
Any bug reports or comments will be appreciated! Please mail
them to sw@cs.arizona.edu or udi@cs.arizona.edu
Regular expressions do not support the '+' operator (match 1
or more instances of the preceding token). These can be
searched for by using this syntax in the pattern:
'pattern(pattern)*'
(search for strings containing one instance of the pattern,
followed by 0 or more instances of the pattern).
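The rewrite of `+' into one instance followed by a Kleene closure can be checked with Python's re module, which does support `+' and so provides a reference to compare against. The helper name one_or_more is made up; it also wraps the first instance in parentheses, a precaution (not part of the syntax shown above) so that a sub-pattern containing `|' keeps its grouping.

```python
import re


def one_or_more(sub_pattern: str) -> str:
    """Express 'one or more of sub_pattern' using only grouping and '*'."""
    return f"({sub_pattern})({sub_pattern})*"


def same_as_plus(sub_pattern: str, candidate: str) -> bool:
    """Compare the rewritten form with the native '+' on one candidate string."""
    rewritten = re.fullmatch(one_or_more(sub_pattern), candidate) is not None
    native = re.fullmatch(f"({sub_pattern})+", candidate) is not None
    return rewritten == native
```

For example, one_or_more("ab") produces (ab)(ab)*, which accepts ab and abab but rejects the empty string, exactly as (ab)+ does.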
The following can cause an infinite loop: agrep pattern * >
output_file. If the number of matches is high, they may be
deposited in output_file before it is completely read
leading to more matches of the pattern within output_file
(the matches are against the whole directory). It's not
clear whether this is a "bug" (grep will do the same), but
be warned.
The maximum size of the patternfile is limited to 250Kb,
and the maximum number of patterns is limited to 30,000.
Standard input is the default if no input file is given.
However, if standard input is keyed in directly (as opposed
to through a pipe, for example) agrep may not work for some
non-simple patterns.
There is no size limit for simple patterns. More
complicated patterns are currently limited to approximately
30 characters. Lines are limited to 1024 characters.
Records are limited to 48K, and may be truncated if they are
larger than that. The limit of record length can be changed
by modifying the parameter Max_record in agrep.h.
DIAGNOSTICS
Exit status is 0 if any matches are found, 1 if none, 2 for
syntax errors or inaccessible files.
AUTHORS
Sun Wu and Udi Manber, Department of Computer Science,
University of Arizona, Tucson, AZ 85721.
{sw|udi}@cs.arizona.edu.